Bibliography
[221] Chaofan Tao, Lu Hou, Wei Zhang, Lifeng Shang, Xin Jiang, Qun Liu, Ping Luo, and
Ngai Wong. Compression of generative pre-trained language models via quantization.
arXiv preprint arXiv:2203.10705, 2022.
[222] Jiayi Tian, Chao Fang, Haonan Wang, and Zhongfeng Wang. Bebert: Efficient and
robust binary ensemble bert. arXiv preprint arXiv:2210.15976, 2022.
[223] Naftali Tishby, Fernando C Pereira, and William Bialek. The information bottleneck
method. arXiv preprint physics/0004057, 2000.
[224] Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles,
and Hervé Jégou. Training data-efficient image transformers & distillation
through attention. In International conference on machine learning, pages 10347–10357.
PMLR, 2021.
[225] VW-S Tseng, Sourav Bhattacharya, Javier Fernández-Marqués, Milad Alizadeh,
Catherine Tong, and Nicholas D Lane. Deterministic binary filters for convolutional
neural networks. International Joint Conferences on Artificial Intelligence Organization,
2018.
[226] Frederick Tung and Greg Mori. Similarity-preserving knowledge distillation. In Proc.
of ICCV, pages 1365–1374, 2019.
[227] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N
Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in
neural information processing systems, 30, 2017.
[228] Diwen Wan, Fumin Shen, Li Liu, Fan Zhu, Jie Qin, Ling Shao, and Heng Tao Shen.
Tbn: Convolutional neural network with ternary inputs and binary weights. In
Proceedings of the European Conference on Computer Vision (ECCV), pages 315–332,
2018.
[229] Diwen Wan, Fumin Shen, Li Liu, Fan Zhu, Jie Qin, Ling Shao, and Heng Tao Shen.
Tbn: Convolutional neural network with ternary inputs and binary weights. In
Proceedings of the European Conference on Computer Vision, pages 315–332, 2018.
[230] Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R
Bowman. Glue: A multi-task benchmark and analysis platform for natural language
understanding. arXiv preprint arXiv:1804.07461, 2018.
[231] Guo-Hua Wang, Yifan Ge, and Jianxin Wu. Distilling knowledge by mimicking
features. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
[232] Jingya Wang, Xiatian Zhu, Shaogang Gong, and Wei Li. Transferable joint
attribute-identity deep learning for unsupervised person re-identification. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
pages 2275–2284, 2018.
[233] Peisong Wang, Qinghao Hu, Yifan Zhang, Chunjie Zhang, Yang Liu, and Jian Cheng.
Two-step quantization for low-bit neural networks. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, pages 4376–4384, 2018.
[234] Song Wang, Dongchun Ren, Li Chen, Wei Fan, Jun Sun, and Satoshi Naoi. On
study of the binarized deep neural network for image classification. arXiv preprint
arXiv:1602.07373, 2016.